October 29, 2020
Today we introduce a paper by Wafaa Wardah et al. of the University of the South Pacific, published in the Journal of Theoretical Biology: "Predicting protein-peptide binding sites with a deep convolutional neural network". Predicting protein-peptide binding sites plays an important role in disease prevention and drug development. However, existing prediction methods do not perform well in practice; in particular, their sensitivity does not even reach 50%. This paper presents a method that predicts protein-peptide binding sites by using a CNN framework to process "visualized" (image-like) protein feature data. The authors use a sliding window method to transform the initial protein feature data into "visualized" matrices, which are then fed into the CNN for training. The prediction is output through a fully connected network, and Bayesian optimization is used to tune the hyper-parameters of the CNN, so that the model achieves good results on the test set.
1. Research background
Studying the interaction between proteins and peptides is of great significance in bioinformatics, and this interaction can be analyzed by studying the structure of the protein-peptide complex. However, such complex structures account for only a small fraction of known structures, and analyzing the interactions through biological experiments is both costly and inefficient. Predicting the binding regions of proteins and peptides computationally is therefore a great help to experimental research.
Existing methods for predicting protein-peptide binding sites perform well on experimental data sets, but their accuracy and sensitivity for binding residues are poor in actual prediction. To address this problem, the authors process "visualized" feature data within a CNN framework.
2. Models and methods
Figure 1 Framework of the CNN-based protein-peptide binding site prediction model
2.1 Feature selection and preprocessing.
In terms of feature selection, the authors used several groups of features that discriminate well when predicting protein-peptide binding sites, such as half-sphere exposure (HSE), secondary structure (SS), accessible surface area (ASA), and the position-specific scoring matrix (PSSM). The authors then combined these feature groups into a single numerical matrix of shape [1, 38], so that each residue is described by 38 values.
2.2 "Visualization" feature transformation.
To feed protein features into the CNN framework, they must first be transformed into "visualized" (image-like) matrices. The authors used the sliding window method: each residue in the protein chain is represented together with its three neighbors on the left and its three neighbors on the right, using a fixed window of size 7 (that is, a window of shape [1, 7] over the residues). Since each residue carries 38 feature values, the residue at the center of the window is represented by the feature matrix of all seven residues, a matrix of shape [7, 38].
Figure 2 Sliding window method
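The sliding window step described in 2.2 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' code: the function name `sliding_windows` and the toy feature values are hypothetical, and filling the positions beyond either end of the chain with zero rows is an assumption, since the summary does not say how terminal residues are handled.

```python
from typing import List

Vector = List[float]


def sliding_windows(features: List[Vector], half_width: int = 3) -> List[List[Vector]]:
    """Return one (2 * half_width + 1) x n_features window per residue.

    Positions that fall outside the chain are filled with zero rows.
    The zero padding is an assumption made for this sketch.
    """
    if not features:
        return []
    n_features = len(features[0])
    windows = []
    for centre in range(len(features)):
        window = []
        for pos in range(centre - half_width, centre + half_width + 1):
            if 0 <= pos < len(features):
                window.append(list(features[pos]))   # real residue
            else:
                window.append([0.0] * n_features)    # assumed zero padding
        windows.append(window)
    return windows


# Toy chain: 10 residues, each with 38 placeholder feature values.
chain = [[float(i)] * 38 for i in range(10)]
windows = sliding_windows(chain)
print(len(windows), len(windows[0]), len(windows[0][0]))  # 10 7 38
```

Each residue thus yields one [7, 38] matrix with the residue itself in the middle row, which is the image-like input passed to the CNN.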
2.3 Model training.
The CNN consists of two convolution layers, one pooling layer and one fully connected layer. In the first convolution layer, 256 convolution kernels of size [3, 3] are applied to the "visualized" feature matrix ([7, 38]), producing 256 feature maps of shape [5, 36], which are then passed through the rectified linear unit (ReLU) activation function. The second convolution layer also has 256 kernels; it produces 256 feature maps of shape [4, 35], and the ReLU activation function is applied again. The pooling layer uses a [2, 2] window and yields 256 pooled feature maps of shape [2, 17]. Finally, the 256 maps of shape [2, 17] are flattened into a vector of shape [1, 8704] and fed into the fully connected network, which outputs a prediction vector of shape [1, 2] indicating whether the residue presented to the CNN is a binding residue. In addition, the authors use Bayesian optimization to tune the hyper-parameters of the CNN, including the number of kernels in the two convolution layers and the learning rate the optimizer uses when updating the network weights.
Figure 3 Training process of "Visualization" feature data
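The layer shapes quoted in 2.3 can be checked with simple arithmetic. The sketch below assumes stride-1 convolutions without padding and non-overlapping pooling; the [2, 2] kernel for the second convolution layer is inferred from the reported change from [5, 36] to [4, 35] and is not stated in the summary. The helper names `conv_out` and `pool_out` are hypothetical.

```python
def conv_out(shape, kernel):
    """Output shape of a stride-1 convolution without padding."""
    return (shape[0] - kernel[0] + 1, shape[1] - kernel[1] + 1)


def pool_out(shape, window):
    """Output shape of non-overlapping pooling (stride equals window size)."""
    return (shape[0] // window[0], shape[1] // window[1])


n_filters = 256                       # kernels per convolution layer
x = (7, 38)                           # one "visualized" residue window
c1 = conv_out(x, (3, 3))              # first convolution layer
c2 = conv_out(c1, (2, 2))             # second layer; 2x2 kernel is inferred
p = pool_out(c2, (2, 2))              # pooling layer
flat = n_filters * p[0] * p[1]        # length of the flattened vector

print(c1, c2, p, flat)  # (5, 36) (4, 35) (2, 17) 8704
```

Under these assumptions the numbers agree with the text: 256 maps of shape [2, 17] flatten to 256 × 2 × 17 = 8704 values.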
3. Experimental results
3.1 Comparison of predicted results with actual results.
The binding residues predicted for real protein chains by the authors' deep CNN on the "visualized" feature matrices are almost the same as those determined experimentally, which suggests that the method predicts binding residues well in practical use. In Figure 4 (a), the upper chain shows the binding residues in the actual protein chain, and the lower chain shows the binding residues predicted by the deep CNN; (b) and (c) are computer-generated renderings of the protein binding sites, where (b) shows the actual binding sites and (c) the predicted ones. The prediction covers the binding residue region accurately.
Figure 4 Predicted and actual results
3.2 Comparison with existing methods.
Table 1 compares the authors' method with several top-ranked methods. In terms of AUC, the authors' method (Visual) and SPRINT-Str are much better than the other methods. SPRINT-Str has a higher AUC than the authors' method, but its detection rate (sensitivity) for binding residues is much lower.
Table 1 Comparison of the top 7 methods
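To make the difference between the two metrics concrete, here is a minimal sketch of how sensitivity and AUC can be computed for per-residue predictions. The labels and scores are made-up toy values, not results from the paper, and the function names and the 0.5 threshold are assumptions chosen for illustration.

```python
def sensitivity(labels, scores, threshold=0.5):
    """Fraction of true binding residues (label 1) predicted as binding."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    return tp / (tp + fn)


def auc(labels, scores):
    """AUC as the probability that a binding residue outscores a
    non-binding one (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (invented for this example): 1 = binding, 0 = non-binding.
labels = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.7]

sens = sensitivity(labels, scores)
area = auc(labels, scores)
print(round(sens, 3), round(area, 3))  # 0.667 0.778
```

AUC summarizes ranking quality over all thresholds, while sensitivity depends on one chosen threshold, which is why a method can have a higher AUC and still detect fewer binding residues at its operating point.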
4. Summary
In this paper, the authors propose transforming protein feature data into a "visualized" feature matrix and feeding it into a deep CNN framework for training, which achieved good results in predicting protein-peptide binding sites in practice. At the end of the paper, the authors also put forward some ideas for improving the method. First, performance might be improved by changing the order of the feature groups. Second, performance might be improved by increasing the sliding window size or adopting a more complex network structure. The actual effect of these changes still needs to be verified by experiments.